Lecture 6

Spatial Correlation and Variography

2024-11-19

What now?

We have already seen different simple interpolation methods that take space into account when estimating unknown values, mainly by incorporating the distance between locations of measured values and the prediction location.

Next, we will look at geostatistical interpolation methods that also take into account knowledge about how certain variables are spatially (auto)correlated.

First, let's recap some basic concepts of correlation.

Random variables

\[ \newcommand{\E}{{\rm E}} % E expectation operator \newcommand{\Var}{{\rm Var}} % Var variance operator \newcommand{\Cov}{{\rm Cov}} % Cov covariance operator \newcommand{\Cor}{{\rm Corr}} \]

Random variables (RVs) are numeric variables whose outcomes are subject to chance.

The probability distribution function \(F_Z(\cdot)\) of the RV \(Z\) gives, for every possible outcome \(z\), the cumulative probability that \(Z\) takes a value of at most \(z\):

\[P(Z \le z) = F_Z(z) = \int_{-\infty}^z f_Z(u)\,du\]

where \(f_Z(\cdot)\) is the probability density function of \(Z\). The total probability is 1, i.e. \(F_Z(z) \rightarrow 1\) as \(z \rightarrow \infty\).
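As a small numerical illustration of the relation between \(F_Z\) and \(f_Z\), the following sketch integrates a density with the trapezoidal rule and compares the result with the closed-form distribution function. The standard normal distribution, the lower integration limit of -10 and all function names are assumptions chosen for this example; they are not part of the lecture material.

```python
import math

def normal_pdf(u):
    """Density f_Z(u) of a standard normal RV (assumed example distribution)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Closed-form distribution function F_Z(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def integrate_pdf(z, lower=-10.0, steps=20000):
    """Approximate the integral of f_Z from `lower` to z with the trapezoidal rule.

    `lower` stands in for minus infinity; the density below -10 is negligible.
    """
    width = (z - lower) / steps
    total = 0.5 * (normal_pdf(lower) + normal_pdf(z))
    for k in range(1, steps):
        total += normal_pdf(lower + k * width)
    return total * width

# Numerical integral of the density next to the closed-form F_Z(z).
for z in (-1.0, 0.0, 1.96):
    print(z, round(integrate_pdf(z), 4), round(normal_cdf(z), 4))

# Integrating over (almost) the whole real line gives the total probability.
print(round(integrate_pdf(10.0), 6))
```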

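Random variables have an expectation (mean) \(\E(Z) = \int_{-\infty}^{\infty} u f_Z(u)\,du\) and a variance \(\Var(Z) = \E[(Z-\E(Z))^2]\).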

Try to think of \(\E(Z)\) as \(\frac{1}{n}\sum_{i=1}^{n} z_i\), with \(n \rightarrow \infty\).
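The long-run-average view of the expectation can be tried out with a short simulation. This is only a sketch: the uniform distribution on [0, 10] (for which \(\E(Z) = 5\)), the seed and the helper name `running_mean` are assumptions made for the example.

```python
import random

def running_mean(draws):
    """Return the sample means (1/n) * sum(z_1..z_n) for n = 1..len(draws)."""
    means, total = [], 0.0
    for n, z in enumerate(draws, start=1):
        total += z
        means.append(total / n)
    return means

rng = random.Random(42)  # fixed seed so the run is reproducible
# Assumed example RV: uniform on [0, 10], so E(Z) = 5.
draws = [rng.uniform(0.0, 10.0) for _ in range(100_000)]
means = running_mean(draws)

# The sample mean settles near E(Z) as n grows.
for n in (10, 1_000, 100_000):
    print(n, means[n - 1])
```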


Two random variables \(X\) and \(Y\) have a covariance defined as \(\Cov(X,Y) = \E[(X-\E(X))(Y-\E(Y))]\).

Correlation and covariance

Correlation is covariance scaled by the standard deviations of the two variables. For two variables \(X\) and \(Y\), it is \[\Cor(X,Y) = \frac{\Cov(X,Y)}{\sqrt{\Var(X)\Var(Y)}}\]

  • it is quite easy to show that \(|\Cov(X,Y)| \le \sqrt{\Var(X)\Var(Y)}\), so correlation ranges from -1 to 1

  • for this, note that \(\Cov(X,X)=\Var(X)\) and \(\Cov(X,-X)=-\Var(X)\).
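One way to see the bound, sketched here under the assumption that both variances are positive and finite, is to standardise the two variables and use the fact that a variance can never be negative:

\[
0 \le \Var\!\left(\frac{X}{\sqrt{\Var(X)}} \pm \frac{Y}{\sqrt{\Var(Y)}}\right)
  = 1 + 1 \pm 2\,\frac{\Cov(X,Y)}{\sqrt{\Var(X)\Var(Y)}}
  = 2 \pm 2\,\Cor(X,Y)
\]

The minus sign gives \(\Cor(X,Y) \le 1\), the plus sign gives \(\Cor(X,Y) \ge -1\). The two special cases \(Y=X\) and \(Y=-X\) attain these bounds.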

It is perhaps easier to think of covariance as unscaled correlation.

Note: A large covariance does not imply a strong correlation, because the covariance depends on the units (and thus on the variances) of \(X\) and \(Y\).
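The effect of scaling can be checked numerically. In this sketch the five height and weight values are made up for illustration, and the helper names are our own; the point is only that changing the unit of one variable multiplies the covariance but leaves the correlation untouched.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance: mean product of the deviations from the means."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def correlation(xs, ys):
    """Covariance scaled by the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Made-up example values: heights in metres and weights in kilograms.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 72.0, 66.0, 80.0, 85.0]
height_mm = [1000.0 * h for h in height_m]  # same data, different unit

# Covariance changes by the factor 1000, correlation stays the same.
print(covariance(height_m, weight_kg), correlation(height_m, weight_kg))
print(covariance(height_mm, weight_kg), correlation(height_mm, weight_kg))
```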

Expectation, variance, covariance, correlation

Summary:

  • Random variable: \(Z\) follows a probability distribution, specified by a density function \(f(z)\) (for a discrete \(Z\): \(f(z)= \Pr(Z=z)\)) or a distribution function \(F(z)=\Pr(Z \le z)\)

  • Expectation: \(\E(Z) = \int_{-\infty}^{\infty} s f(s)\,ds\) – center of mass, mean.

  • Variance: \(\Var(Z)=\E[(Z-\E(Z))^2]\) – mean squared distance from mean; measure of spread; square root: standard deviation of \(Z\).

  • Covariance: \(\Cov(X,Y)=\E[(X-\E(X))(Y-\E(Y))]\) – mean product of the deviations from the means; can be negative; \(\Cov(X,X)=\Var(X)\).

  • Correlation: \(r_{XY}=\frac{\Cov(X,Y)}{\sqrt{\Var(X)\Var(Y)}}\) – covariance normalized to \([-1,1]\). -1 or +1: perfect correlation.

Correlation

What is spatial correlation?

Waldo Tobler’s first law of geography:

“Everything is related to everything else, but near things are more related than distant things.” [Tobler, 1970, p. 236]

  • But how then is “being related” expressed?

Tobler, W. R. (1970). “A computer movie simulating urban growth in the Detroit region”. Economic Geography, 46(Supplement): 234-240.

Spatial correlation can be explored in different ways.

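One way is to take up an idea from time series analysis: look at lagged correlations and the \(h\)-scatterplot.

An \(h\)-scatterplot plots \(Z(s)\) against \(Z(s+h)\) for all pairs of observations separated by \(h\), where \(s+h\) is the location \(s\) shifted by the lag \(h\) (a time distance or a spatial distance). The lagged correlation is the correlation between \(Z(s)\) and \(Z(s+h)\) over these pairs.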
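For a regularly spaced series, the pairs behind an \(h\)-scatterplot and their correlation take only a few lines to compute. The sketch below uses a simulated autoregressive series as stand-in data; the series, its coefficient 0.8, the seed and the function names are assumptions for illustration, not data from the lecture.

```python
import math
import random

def lagged_pairs(z, h):
    """All pairs (Z(s), Z(s+h)) of a regularly spaced series: the points of an h-scatterplot."""
    return [(z[i], z[i + h]) for i in range(len(z) - h)]

def correlation(pairs):
    """Pearson correlation between the first and second element of the pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Assumed example data: each value is 0.8 times its predecessor plus noise,
# so neighbouring values are similar and distant values are not.
rng = random.Random(1)
z = [0.0]
for _ in range(4999):
    z.append(0.8 * z[-1] + rng.gauss(0.0, 1.0))

# Lagged correlation for increasing lags h.
for h in (1, 2, 5, 10, 20):
    print(h, round(correlation(lagged_pairs(z, h)), 3))
```

Plotting `lagged_pairs(z, h)` as a scatterplot for a few values of `h` gives the \(h\)-scatterplots; the printed correlations summarise each of them in one number.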

What is spatial correlation? - \(h\)-scatterplots

Covariance against distance

Another way to explore spatial correlation is to plot covariances of values at point pairs against the distance between these points.

  • Group into intervals: the point pairs are grouped by their separation distance.
  • Empirical covariogram: look at the means within the intervals.
  • Theoretical covariogram: fit a line (a model curve) to these means.

From covariance to semivariance

In geostatistics, spatial correlation is usually modelled by the semivariogram rather than by a covariogram (or correlogram). The term variogram is used synonymously with semivariogram. The (semi)variogram plots semivariance as a function of distance.

Covariance: \(\Cov(Z(s),Z(s+h)) = C(h) = \E[(Z(s)-m)(Z(s+h)-m)]\)

Semivariance: \(\gamma(h) = \frac{1}{2} \E[(Z(s)-Z(s+h))^2]\)

\[\E[(Z(s)-Z(s+h))^2] = \E[(Z(s))^2 + (Z(s+h))^2 -2Z(s)Z(s+h)]\]

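Assume \(m=0\) and a variance that is the same at every location, so that \(\E[(Z(s))^2] = \E[(Z(s+h))^2] = \Var(Z(s)) = C(0)\):

\[\begin{aligned} \E[(Z(s)-Z(s+h))^2] &= \E[(Z(s))^2] + \E[(Z(s+h))^2] - 2\E[Z(s)Z(s+h)] \\ &= 2\Var(Z(s)) - 2\Cov(Z(s),Z(s+h)) = 2C(0)-2C(h) \end{aligned}\]

Hence \(\gamma(h) = C(0)-C(h)\), where \(\gamma(h)\) is the semivariogram of \(Z(s)\).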
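The assumption \(m=0\) only shortens the algebra. As a sketch of the general case, assuming a constant mean \(m\) and a covariance \(C(h)\) that depends only on \(h\), subtract and add \(m\) inside the difference:

\[
\begin{aligned}
\E[(Z(s)-Z(s+h))^2] &= \E[((Z(s)-m)-(Z(s+h)-m))^2] \\
&= \E[(Z(s)-m)^2] + \E[(Z(s+h)-m)^2] - 2\E[(Z(s)-m)(Z(s+h)-m)] \\
&= C(0) + C(0) - 2C(h)
\end{aligned}
\]

So \(\gamma(h) = C(0)-C(h)\) holds for any constant mean, because \(m\) cancels in the difference \(Z(s)-Z(s+h)\).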

The Variogram

  • the central tool of geostatistics
  • what the mean squares (variances) are to analysis of variance, or what \(t\) is to a \(t\)-test
  • measures spatial correlation
  • subject to debate: it involves modelling
  • synonymous with semivariogram, but
  • semivariance is not synonymous with variance

Variogram: how to compute

\[\hat{\gamma}(\tilde{h})=\frac{1}{2N_h}\sum_{i=1}^{N_h}(Z(s_i)-Z(s_i+h))^2 \ \ h \in \tilde{h}\]

  • average the squared differences of the \(N_h\) point pairs, then halve the result: dividing the sum by \(2N_h\) rather than \(N_h\) is what makes it a semivariance
  • if data are not gridded, group the \(N_h\) pairs \(s_i,s_i+h\) for which \(h \in \tilde{h}\), \(\tilde{h}=[h_1,h_2]\)
  • choose about 10-25 distance intervals \(\tilde{h}\), from length 0 to about one third of the area size
  • “plot” \(\hat{\gamma}(\tilde{h})\) at the average value of all \(h \in \tilde{h}\)
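The steps in this list can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used by any geostatistics package: the function name `empirical_variogram`, its arguments `cutoff` and `n_bins`, and the simulated sample data are all assumptions made for the example.

```python
import math
import random

def empirical_variogram(coords, values, cutoff, n_bins):
    """Sample semivariance per distance interval.

    coords: list of (x, y) locations; values: observations Z(s_i).
    Returns (mean distance, semivariance, number of pairs) for every
    non-empty interval of width cutoff / n_bins.
    """
    width = cutoff / n_bins
    sum_sq = [0.0] * n_bins    # sum of squared differences per interval
    sum_dist = [0.0] * n_bins  # sum of pair distances per interval
    n_pairs = [0] * n_bins     # N_h per interval
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dist = math.hypot(coords[i][0] - coords[j][0],
                              coords[i][1] - coords[j][1])
            if dist >= cutoff:
                continue  # pair is farther apart than the largest interval
            k = min(int(dist / width), n_bins - 1)  # interval containing dist
            sum_sq[k] += (values[i] - values[j]) ** 2
            sum_dist[k] += dist
            n_pairs[k] += 1
    return [(sum_dist[k] / n_pairs[k], sum_sq[k] / (2 * n_pairs[k]), n_pairs[k])
            for k in range(n_bins) if n_pairs[k] > 0]

# Made-up sample: 200 random locations, a smooth surface plus a little noise.
rng = random.Random(7)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
values = [math.sin(x / 20.0) + math.cos(y / 20.0) + rng.gauss(0.0, 0.1)
          for x, y in coords]

# 15 intervals up to one third of the diagonal of the study area.
cutoff = math.hypot(100, 100) / 3
for dist, gamma, n in empirical_variogram(coords, values, cutoff, 15):
    print(round(dist, 1), round(gamma, 3), n)
```

Each returned tuple is one point of the empirical variogram: the semivariance of an interval, placed at the average distance of the pairs that fell into it.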

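Plotting semivariance against distance

The variogram is built in the same three steps as the covariogram:

  • Group into intervals: the squared differences of all point pairs are grouped by separation distance.
  • The empirical variogram: look at the means within the intervals.
  • The theoretical variogram: fit a line to the empirical variogram.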
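What "fitting a line" can mean in practice is sketched below: an exponential model with nugget, partial sill and range parameter is compared with the empirical variogram, and the parameter combination with the smallest weighted squared error is kept. The brute-force grid search, the weighting by number of pairs, the parameter names and the synthetic "empirical" values are assumptions for this example; statistical software typically uses an iterative least-squares routine instead.

```python
import math

def exponential_model(h, nugget, psill, a):
    """Exponential variogram: nugget + psill * (1 - exp(-h / a)) for h > 0, and 0 at h = 0."""
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-h / a))

def fit_exponential(empirical, nuggets, psills, ranges):
    """Grid search over candidate parameters.

    empirical: list of (distance, semivariance, n_pairs).
    Returns the (nugget, psill, a) with the smallest squared error,
    each interval weighted by its number of pairs.
    """
    best, best_err = None, float("inf")
    for nugget in nuggets:
        for psill in psills:
            for a in ranges:
                err = sum(n * (g - exponential_model(h, nugget, psill, a)) ** 2
                          for h, g, n in empirical)
                if err < best_err:
                    best, best_err = (nugget, psill, a), err
    return best

# Synthetic empirical variogram generated from known parameters, so that
# the fit can be checked against them (not real data).
true_params = (0.1, 0.5, 300.0)
empirical = [(float(h), exponential_model(h, *true_params), 50)
             for h in range(100, 1600, 100)]

nuggets = [0.0, 0.05, 0.1, 0.15, 0.2]
psills = [0.3, 0.4, 0.5, 0.6, 0.7]
ranges = [100.0, 200.0, 300.0, 400.0, 500.0]
print(fit_exponential(empirical, nuggets, psills, ranges))
```

With real data the empirical values scatter around the model curve, so the choice of candidate values and of the weights has a visible influence on the fitted line. This is one reason why the variogram is said to involve modelling.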

Variogram: terminology

Models for variograms
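As a reminder of the terminology, three common models can be written with a nugget \(c_0\) (the jump of the variogram at very small distances), a partial sill \(c_1\) (so that the sill, the level at which the variogram flattens, is \(c_0+c_1\)) and a range parameter \(a\). The symbols \(c_0\), \(c_1\) and \(a\) and the parametrisation below are assumptions; it is a common one, but other texts rescale the range parameter. For \(h>0\), with \(\gamma(0)=0\):

\[
\begin{aligned}
\text{Spherical:} \quad \gamma(h) &= \begin{cases} c_0 + c_1\left(\frac{3h}{2a} - \frac{h^3}{2a^3}\right) & 0 < h \le a \\ c_0 + c_1 & h > a \end{cases} \\
\text{Exponential:} \quad \gamma(h) &= c_0 + c_1\left(1 - e^{-h/a}\right) \\
\text{Gaussian:} \quad \gamma(h) &= c_0 + c_1\left(1 - e^{-h^2/a^2}\right)
\end{aligned}
\]

The spherical model reaches its sill exactly at \(h=a\). The exponential and Gaussian models only approach it; they reach about 95% of the partial sill at \(h=3a\) and \(h=\sqrt{3}\,a\), respectively, which is often called the practical range.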

Why prefer the variogram over the covariogram?

Covariance: \(\Cov(Z(s),Z(s+h)) = C(h) = \E[(Z(s)-m)(Z(s+h)-m)]\)

Semivariance: \(\gamma(h) = \frac{1}{2} \E[(Z(s)-Z(s+h))^2]\)

\(\gamma(h)=C(0)-C(h)\)

  • tradition
  • \(C(h)\) needs (an estimate of) \(m\), \(\gamma(h)\) does not
  • \(C(0)\) may not exist (\(\infty\)!), while \(\gamma(h)\) does (e.g., Brownian motion)
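The Brownian motion case can be made tangible with a simulation. The sketch below uses a discrete random walk (a cumulative sum of independent standard normal steps) as a stand-in for Brownian motion; the number of paths, the path length, the seed and the function names are assumptions for illustration. Under this set-up the variance of \(Z(t)\) grows like \(t\), while the semivariance at lag \(h\) is \(h/2\) no matter where the pair starts.

```python
import random

def random_walk(n_steps, rng):
    """Discrete stand-in for Brownian motion: cumulative sum of independent N(0, 1) steps."""
    z, path = 0.0, [0.0]
    for _ in range(n_steps):
        z += rng.gauss(0.0, 1.0)
        path.append(z)
    return path

rng = random.Random(3)
paths = [random_walk(200, rng) for _ in range(2000)]

def variance_at(t):
    """Variance of Z(t) across the simulated paths."""
    vals = [p[t] for p in paths]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def semivariance(s, h):
    """Half the mean squared increment Z(s + h) - Z(s) across the simulated paths."""
    return 0.5 * sum((p[s + h] - p[s]) ** 2 for p in paths) / len(paths)

# The variance keeps growing with t, so there is no single value C(0) ...
print([round(variance_at(t), 1) for t in (10, 50, 200)])
# ... but the semivariance at lag h = 10 is about the same wherever we start.
print([round(semivariance(s, 10), 1) for s in (0, 50, 150)])
```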

Anisotropy

Some processes are directionally dependent (anisotropic), i.e. they do not have identical properties in all directions. For such phenomena, the semivariance depends not only on the distance between two points but also on the direction of the distance vector.

  • example: global annual mean temperature, which changes much more from north to south than from east to west.

Isotropic (left) vs anisotropic (right) process

Check for Isotropy/Anisotropy

  • group values not only regarding distance but also direction of the distance vector
  • investigate the resulting variograms, e.g. for four directions with gstat:

plot(variogram(log(zinc) ~ 1, meuse.sf, alpha = c(0, 45, 90, 135)))
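What grouping by direction involves can be sketched in plain Python, independently of gstat. The sketch assumes that directions are measured in degrees clockwise from north and that a pair belongs to a direction if it lies within ±22.5° of it, which is how we understand the `alpha` argument and its default tolerance for four directions; treat both as assumptions. All function names and the gridded example data are made up for illustration.

```python
import math

def pair_direction(p, q):
    """Direction of the vector p -> q in degrees clockwise from north (+y), folded into [0, 180)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.degrees(math.atan2(dx, dy)) % 180.0

def direction_class(angle, directions, tolerance):
    """Return the first direction within `tolerance` degrees of `angle` (modulo 180), or None."""
    for d in directions:
        diff = abs(angle - d) % 180.0
        diff = min(diff, 180.0 - diff)
        if diff <= tolerance:
            return d
    return None

def directional_semivariance(coords, values, directions, tolerance, cutoff):
    """Semivariance of all pairs closer than `cutoff`, computed separately per direction."""
    sums = {d: 0.0 for d in directions}
    counts = {d: 0 for d in directions}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dist = math.hypot(coords[j][0] - coords[i][0], coords[j][1] - coords[i][1])
            if dist == 0 or dist >= cutoff:
                continue
            d = direction_class(pair_direction(coords[i], coords[j]), directions, tolerance)
            if d is None:
                continue
            sums[d] += (values[i] - values[j]) ** 2
            counts[d] += 1
    return {d: sums[d] / (2 * counts[d]) for d in directions if counts[d] > 0}

# Made-up anisotropic example: on a 10 x 10 grid the variable changes
# only from south to north (it equals the y coordinate).
coords = [(x, y) for x in range(10) for y in range(10)]
values = [float(y) for _, y in coords]
result = directional_semivariance(coords, values, (0, 45, 90, 135), 22.5, 3.0)
print(result)
```

In this example the east-west direction (90) shows no variability at all, while the other directions do: a strong hint of anisotropy. A full directional variogram would additionally group the pairs of each direction into distance intervals.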

Intrinsic Stationarity

In order to be able to estimate spatial correlation from observational data, we need to assume intrinsic stationarity.

This assumes the underlying process to be a random function composed of a mean and residual

\(Z(s) = m + e(s)\)

with a constant mean

\(E(Z(s)) = m\)

and a variogram defined as

\(\gamma(h)= \frac{1}{2}\E[(Z(s)-Z(s+h))^2]\)

This implies that the variance of the differences \(Z(s)-Z(s+h)\), and with it the spatial correlation of \(Z\), does not depend on location \(s\), but only on separation distance \(h\).

Simulation examples

(Co)Variograms and random fields

Given a theoretical (co)variogram, we can create processes (random fields) that have the desired properties.

In the following, we create different example simulations that show for an (artificial) variable how different variogram properties are associated with different spatial distributions of the values of that variable.
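One standard way to create such a process, sketched here for a one-dimensional transect rather than a map, is to build the covariance matrix implied by the model and multiply its Cholesky factor with independent standard normal numbers. This is an illustration under our own assumptions (exponential covariance without nugget, zero mean, unit spacing, hand-written Cholesky routine, hypothetical function names) and not necessarily how the simulations in this lecture were produced.

```python
import math
import random

def exponential_covariance(h, sill, a):
    """C(h) = sill * exp(-|h| / a): the covariance matching an exponential variogram without nugget."""
    return sill * math.exp(-abs(h) / a)

def cholesky(matrix):
    """Lower-triangular L with L * L^T = matrix (symmetric positive definite input)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def simulate_transect(n, sill, a, rng):
    """One realisation of a zero-mean Gaussian process at locations 0, 1, ..., n - 1."""
    cov = [[exponential_covariance(i - j, sill, a) for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]  # uncorrelated noise
    # Multiplying by L turns the noise into values with covariance matrix `cov`.
    return [sum(lower[i][k] * white[k] for k in range(i + 1)) for i in range(n)]

rng = random.Random(11)
short_range = simulate_transect(100, sill=1.0, a=2.0, rng=rng)
long_range = simulate_transect(100, sill=1.0, a=30.0, rng=rng)
print([round(v, 2) for v in short_range[:10]])
print([round(v, 2) for v in long_range[:10]])
```

Both transects have the same sill, but neighbouring values differ much less in the long-range realisation, which therefore looks smoother.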

Recall the main characteristics of variograms

Recall the main models for variograms

Example process: Exponential model

Exponential model: Varying range

Example process: Gaussian model

Gaussian model: Varying sill

Gaussian model: Varying nugget / sill

Example process: Anisotropy